Some technical details on confidence intervals for LIFT measures in data mining
Authors: Wenxin Jiang, Yu Zhao
Abstract
A LIFT measure, such as the response rate, lift, or percentage of captured response, is a fundamental measure of the effectiveness of a scoring rule obtained from data mining, and it is estimated from a set of validation data. The LIFT measures are related to the ROC (Receiver Operating Characteristic) curve, but there are some important differences. In this paper, we study how to construct confidence intervals for the LIFT measures. We point out the difficulty of this task and explain how simple binomial confidence intervals can have incorrect coverage probabilities because they omit the variation arising from the sample percentile of the scoring rule. We derive the asymptotic distribution using advanced empirical process theory and the functional delta method in ...
∗ Technical Report 14-02, Department of Statistics, Northwestern University.
† Wenxin Jiang is Professor in the Department of Statistics, Northwestern University, Evanston, IL 60208; Yu Zhao is a Statistician at Amazon.
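To make the quantities concrete: if the top fraction q of validation cases by score is selected, the response rate is the share of responders among the selected cases, the lift is that response rate divided by the overall response rate, and the captured response is the share of all responders that fall in the selected group (which equals q times the lift). The Python sketch below computes all three from a validation sample, together with a simple Wald binomial interval for the response rate that treats the selected group as a fixed-size sample. For contrast, it adds a nonparametric percentile bootstrap that re-estimates the score percentile in every resample. The function names, the Wald form of the "simple binomial" interval, and the simulated data are illustrative assumptions; the bootstrap is a swapped-in technique, not the asymptotic intervals the paper derives via the functional delta method, and its coverage is not verified here. The sketch also assumes continuous scores with no ties at the cutoff.

```python
import math
import random

# Hypothetical helper names; the bootstrap below is an illustrative substitute,
# not the paper's asymptotic (functional delta method) construction.


def lift_measures(scores, labels, q):
    """Return (response_rate, lift, captured_response) for the top fraction q.

    The cutoff is the sample (1 - q) percentile of the scores: the m ~ q*n
    highest-scoring observations are "selected". Assumes continuous scores
    (no ties at the cutoff) and 0/1 labels.
    """
    n = len(scores)
    m = max(1, round(q * n))                      # size of the selected group
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    hits = sum(labels[i] for i in order[:m])      # responders in the top group
    total = sum(labels)                           # responders overall
    response_rate = hits / m                      # estimate of P(Y = 1 | selected)
    base_rate = total / n                         # estimate of P(Y = 1)
    lift = response_rate / base_rate if total else float("nan")
    captured = hits / total if total else float("nan")  # P(selected | Y = 1)
    return response_rate, lift, captured


def naive_binomial_ci(scores, labels, q, z=1.96):
    """Wald interval for the response rate that treats the selected group as
    a fixed-size binomial sample, ignoring that the cutoff is estimated."""
    rr = lift_measures(scores, labels, q)[0]
    m = max(1, round(q * len(scores)))
    half = z * math.sqrt(rr * (1 - rr) / m)
    return rr - half, rr + half


def bootstrap_ci(scores, labels, q, n_boot=1000, level=0.95, seed=0):
    """Percentile-bootstrap intervals for all three measures.

    Each resample recomputes the sample percentile cutoff, so the variation
    that the naive interval omits shows up in the spread of the replicates.
    """
    rng = random.Random(seed)
    n = len(scores)
    reps = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]   # resample cases with replacement
        reps.append(lift_measures([scores[i] for i in idx],
                                  [labels[i] for i in idx], q))
    alpha = 1 - level
    intervals = []
    for j in range(3):
        vals = sorted(r[j] for r in reps if not math.isnan(r[j]))
        lo = vals[int(alpha / 2 * len(vals))]
        hi = vals[min(len(vals) - 1, int((1 - alpha / 2) * len(vals)))]
        intervals.append((lo, hi))
    return intervals


if __name__ == "__main__":
    # Simulated validation data (an assumption): Gaussian scores and a
    # logistic response probability that increases with the score.
    rng = random.Random(1)
    scores = [rng.gauss(0, 1) for _ in range(500)]
    labels = [int(rng.random() < 1 / (1 + math.exp(2 - s))) for s in scores]
    print("measures at q=0.1:", lift_measures(scores, labels, 0.1))
    print("naive Wald CI for response rate:", naive_binomial_ci(scores, labels, 0.1))
    print("bootstrap CIs:", bootstrap_ci(scores, labels, 0.1, n_boot=500))
```

The bootstrap is used here only because it carries the cutoff's sampling variability into the interval without any derivation; whether its coverage reaches the nominal level in a particular setting would need to be checked separately, for example by simulation.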
Similar resources
New probabilistic interest measures for association rules
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. Many different measures of interestingness have been proposed for association rules. However, these measures fail to take the probabilistic properties of the mined data into account. We begin this paper by presenting a simple probabilistic framework for transaction data which can b...
Area specific confidence intervals for a small area mean under the Fay-Herriot model
Small area estimates have received much attention from both private and public sectors due to the growing demand for effective planning of health services, apportioning of government funds and policy and decision making. Surveys are generally designed to give representative estimates at national or district level, but estimates of variables of interest are oft...
Using a Data Mining Tool and FP-Growth Algorithm Application for Extraction of the Rules in two Different Dataset (TECHNICAL NOTE)
In this paper, we aim to improve association rules so that they can be used in recommenders. Recommender systems provide a way to create personalized offers. One of the most important types of recommender systems is collaborative filtering, which mines user information to offer users appropriate items. Among the data mining methods, finding frequent item sets and ...
Implications of Probabilistic Data Modeling for Mining Association Rules
Mining association rules is an important technique for discovering meaningful patterns in transaction databases. In the current literature, the properties of algorithms to mine association rules are discussed in great detail. We present a simple probabilistic framework for transaction data which can be used to simulate transaction data when no associations are present. We use such data and a re...
On Mining Fuzzy Classification Rules for Imbalanced Data
A fuzzy rule-based classification system (FRBCS) is a popular machine learning technique for classification purposes. One of the major issues when applying it to imbalanced data sets is its bias toward the majority class, so that it performs poorly with respect to the minority class. However, in many cases the minority classes are more important than the majority ones. In this paper, we have extended ...